Conditional entropy is a measure of the impurity, uncertainty, or randomness remaining in a random variable once the value of another random variable is known.
In the context of classification problems, conditional entropy quantifies the uncertainty that remains about a target variable \( Y \), whose possible values are the class labels, once a category/value \( x \) of an attribute \( X \) is known.
For a binary classification problem, the conditional entropy \( H(Y \mid X) \) is calculated with the following formula:
\[ H(Y \mid X) = \sum_{x \in X} p(x) H(Y \mid X = x) \]
Where:
- \( p(x) \) is the probability (relative frequency) of the category/value \( x \) of attribute \( X \),
- \( H(Y \mid X = x) \) is the entropy of \( Y \) among the examples for which \( X = x \),
- the sum runs over all possible categories/values \( x \) of attribute \( X \).
The conditional entropy for an individual category/value \( x \) of attribute \( X \) (shown as \( H(\text{Category}) \) in the calculator below) is calculated with the following formula:
\[ H(Y \mid X = x) = - \sum_{y \in Y} p(y \mid x)\log_2(p(y \mid x)) \]
Where:
- \( p(y \mid x) \) is the probability of class \( y \) among the examples for which \( X = x \),
- the sum runs over all possible categories/values \( y \) of the class attribute \( Y \).
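To make the two formulas concrete, here is a minimal Python sketch, assuming the data are given as plain lists of attribute values and class labels (the function names `entropy_of_subset` and `conditional_entropy` are illustrative, not part of the original page):

```python
from collections import Counter
from math import log2

def entropy_of_subset(labels):
    """H(Y | X = x): entropy of the class labels within one attribute category."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def conditional_entropy(attribute_values, labels):
    """H(Y | X): per-category entropies weighted by the category probabilities p(x)."""
    total = len(labels)
    result = 0.0
    for x in set(attribute_values):
        subset = [y for value, y in zip(attribute_values, labels) if value == x]
        result += (len(subset) / total) * entropy_of_subset(subset)
    return result
```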
For example, consider the class (\(Y\)) sample space: \(\text{Play golf} = \{\text{yes}, \text{no}\}\), and the attribute (\(X\)) sample space: \(\text{Outlook} = \{\text{Sunny}, \text{Rainy}\}\). The conditional entropy is expressed as:
\[ H(\text{Play golf} \mid \text{Outlook}) = \sum_{x \in \text{Outlook}} p(x) H(\text{Play golf} \mid \text{Outlook} = x) \]
Specifically, for the category "Sunny," we can calculate:
\[ H(\text{Play golf} \mid \text{Outlook} = \text{Sunny}) = - \sum_{y \in \text{Play golf}} p(y \mid \text{Sunny}) \log_2(p(y \mid \text{Sunny})) \]
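For instance, assuming a hypothetical sample of 5 Sunny days with 3 "yes" and 2 "no" outcomes, this evaluates to:
\[ H(\text{Play golf} \mid \text{Outlook} = \text{Sunny}) = -\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5} \approx 0.971 \]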
Conditional Entropy calculator
The interactive calculator takes the number of Class 1 and Class 2 examples for each category/value \( x \) and reports the ratio of the two classes, the per-category entropy \( H(Y \mid X = x) \), and the overall conditional entropy \( H(Y \mid X) \).
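The same computation the calculator performs can be sketched in Python, assuming each category is supplied as a pair of class counts (the function names and the example counts below are hypothetical):

```python
from math import log2

def category_entropy(class1, class2):
    """H(Y | X = x) for a single category, given its two class counts."""
    total = class1 + class2
    return -sum((count / total) * log2(count / total)
                for count in (class1, class2) if count > 0)

def conditional_entropy_from_counts(categories):
    """H(Y | X): each category's entropy weighted by its share of all examples."""
    grand_total = sum(class1 + class2 for class1, class2 in categories)
    return sum(((class1 + class2) / grand_total) * category_entropy(class1, class2)
               for class1, class2 in categories)

# Hypothetical counts: (yes, no) for Sunny and Rainy
print(conditional_entropy_from_counts([(3, 2), (1, 4)]))
```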